Extraction for Frequent Sequential Patterns with Minimum Variable-Wildcard Regions

Authors

  • Tomoyuki Kato
  • Hajime Kitakami
  • Makoto Takaki
  • Keiichi Tamura
  • Yasuma Mori
  • Susumu Kuroki
Abstract

We propose a new methodology for extracting all frequent sequential patterns with minimum variable-length wildcard regions from sequence databases, with the aim of extracting motif candidates from amino acid sequences. A scope database defined by a k-length pattern consists not only of the projected database, which includes the start position of a scan, but also of the range of the scan and the occurrences that serve as evidence for the pattern. The scope database makes it possible to avoid constructing a variable-length wildcard region that is too large to explain the occurrences serving as evidence for each (k+1)-length pattern. The scope database is also used to eliminate redundancy from the set of solutions. A prototype was evaluated on a dataset that includes the Leucine Zipper motif. The method proved highly capable of extracting non-redundant sequential patterns with minimum variable-wildcard regions.
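
The idea of a wildcard region that is no wider than its supporting occurrences can be illustrated with a minimal sketch. This is not the scope-database algorithm of the paper; it is a simplified, hypothetical two-element case. The function names (`shortest_gap`, `minimal_wildcard_region`), the parameters `max_gap` and `min_support`, and the toy sequences are all assumptions introduced here for illustration.

```python
from typing import List, Optional, Tuple


def shortest_gap(seq: str, a: str, b: str, max_gap: int) -> Optional[int]:
    """Return the smallest number of positions between an `a` and a later `b`,
    considering only gaps of at most `max_gap`; None if no such occurrence."""
    best: Optional[int] = None
    for i, ch in enumerate(seq):
        if ch != a:
            continue
        # Scan only the window allowed by max_gap after position i.
        for j in range(i + 1, min(len(seq), i + max_gap + 2)):
            if seq[j] == b:
                gap = j - i - 1
                if best is None or gap < best:
                    best = gap
                break  # the first b after this a gives its shortest gap
    return best


def minimal_wildcard_region(
    db: List[str], a: str, b: str, max_gap: int, min_support: int
) -> Optional[Tuple[int, int, int]]:
    """For the two-element pattern a-x(lo,hi)-b, return (lo, hi, support),
    where [lo, hi] is the narrowest gap range covering one occurrence per
    supporting sequence. Returns None if the pattern is not frequent."""
    gaps = [g for g in (shortest_gap(s, a, b, max_gap) for s in db) if g is not None]
    if len(gaps) < min_support:
        return None
    return min(gaps), max(gaps), len(gaps)


# Hypothetical toy database (not data from the paper).
toy_db = ["LAAAAAAL", "LKKL", "GGGG"]
result = minimal_wildcard_region(toy_db, "L", "L", max_gap=6, min_support=2)
```

In this sketch the wildcard region is derived from the occurrences themselves rather than fixed in advance, which mirrors, in a much reduced form, the abstract's point about not building regions wider than the evidence requires.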

Similar resources

WildSpan: Efficient Discovery of Functional Motifs Spanning Large Wildcard Regions from Protein Sequences

Motivation: Automatic extraction of motifs from biological sequences is an important problem in molecular biology. For proteins, it is desired to discover sequence motifs containing large irregular gaps as the contact residues associated with a functional site are not always from one region of the sequences. Discovering such patterns is a time-consuming task due to a large number of combination...

Does Fundraising Have Meaningful Sequential Patterns? The Case of Fintech Startups

Nowadays, fundraising is one of the most important issues for both Fintech investors and startups. The pattern of fundraising in terms of “number and type of rounds and stages needed” is important. The diverse features and factors that could stem from Fintech business models and can influence success are among the key issues in shaping these patterns. This study applied the top 100 KPMG Fintech...

Hybrid Technique for Frequent Pattern Extraction from Sequential Database

Data mining has become a familiar tool for mining stored value from the large-scale databases known as sequential databases. These databases have a large number of itemsets that can arrive frequently and sequentially; they can also predict users' behaviors. The evaluation of user behavior is done by using sequential pattern mining where the frequent patterns extracted with several limitat...

High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences

Nowadays, high fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns whose utility meets a user-specified minimum high utility threshold. It comprises extracting patterns which are highly accessed in mobile web service sequences. Different from the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...

Extracting Feature Sequences in Software Vulnerabilities Based on Closed Sequential Pattern Mining

Feature Extraction is significant for determining security vulnerabilities in software. Mining closed sequential patterns provides complete and condensed information for non-redundant frequent sequences generation. In this paper, we discuss the feature interaction problem and propose an efficient algorithm to extract features in vulnerability sequences. Each closed sequential pattern represents...

Publication date: 2006